Abstract

In this work we explore possibilities for coding when information
words have different (semantic) values. We introduce a loss function
that expresses the overall performance of a coding scheme for discrete
channels and replace the usual goal of minimizing the error probability
with that of minimizing the expected loss. In this environment we
explore the possibilities of using poset decoders to obtain a message-wise
unequal error protection (UEP), where the most valuable information
is protected by placing in its proximity information words whose
difference from it has small value. Similar definitions and results are
also briefly presented for signal constellations in Euclidean space.

1 Introduction 

Since the mid 1990s, some new metrics have been introduced in the study of
error-correcting codes, mainly metrics determined by a partial order on the
set of coordinate positions of linear codes, called for simplicity just poset metrics.
The relevance of such metrics is being determined considering channels for 
which such metric structures are more appropriate than the usual Hamming 
metric (see [21] and [26]). In this work we generally assume the most usual 
setting of coding theory, the use of linear codes over discrete channels (DC), 
but introduce a new parameter, the value of the information, that turns poset 
metrics into a valuable tool for getting decoders with a good performance. 

As noted by Claude Shannon in the introduction of his seminal work (see
[23]), the "... semantic aspects of communication are irrelevant to the
engineering problem". In this work, we do not consider the semantics of
information, only the possibility of considering its semantic value, something
that should be defined by experts in the different fields producing the
information to be communicated. Considering such a value function, which
associates to each information word a non-negative real number, allows us to
make a slight but relevant change in one of the main questions that drive
coding theory: instead of searching for codes that minimize the number of
errors, we can look for codes that minimize the overall value of the decoded
errors.^1

In this work we establish a general framework for considering value of in- 
formation in coding theory, presenting first some existence results that open a 
wide range of new questions. The introduction of expected loss functions gen- 
eralizes the usual approach of maximum likelihood decoders (ML) and poses 
a new theoretical goal: instead of looking for a code (with given properties, 
such as dimension and length) that minimizes the expectation of the number of 
errors after decoding, we are actually looking for a triple, consisting of a code, 
the way information is mapped into the code and a decoder that minimizes 
the expected loss. 

To deal with such a larger and difficult goal, we bring into the picture a 
family of decoders that are in some sense more manageable, decoders that 
are nearest-neighbor decoders, according to a family of metrics called poset- 
metrics. Considering those metrics we are able to show, in a general setting, the 
existence of nearest-neighbor decoders that beat the performance of classical
ML decoders. In hindsight this is not surprising, since ML decoders answer a
different question (minimizing the number of errors). Moreover, considering a

^1 Concerning the question of semantics, we stress that we do not aim to
establish a mathematical-theoretical framework that will allow semantic
communication, such as the one being developed by Juba and Sudan ([13]); we
just assume that, in some sense, a semantic value has been attributed to
information.

particular set of posets, those called hierarchical posets, we are able to move
forward and determine efficient decoding algorithms (see [8] and [20]).

The approach adopted in this work goes somehow in the same direction 
that has been followed in some recent works. In the decoding process, the use 
of nearest-neighbor decoders determined by poset-metrics is actually a decod- 
ing process that gives unequal error protection for bits (bit-wise UEP), in a 
similar way as proposed in 1967 by Masnick and Wolf in [16] and since then 
extensively studied by many authors. Considering unequal error protection of
messages (message-wise UEP) instead of bits is the approach adopted by
Borade, Nakiboglu and Zeng in [3], who consider the need to protect in
different ways information of a different nature (like data and control
messages) or different types of errors (erasures and mis-decoded messages),
showing that it is possible to achieve the channel capacity exponentially for
some more protected bits (by "stealing" the capacity from other bits).
The approach given in this work is in some sense more general and combines
unequal error protection of messages and bits. Moreover, in the approach
adopted in [5], more valuable information is over-protected by assigning
larger decoding regions, while in this work the approach to message-wise
unequal error protection is significantly different: here, more valuable
information is protected by placing in its neighborhood information with
similar semantic value.




Figure 1: Message-wise UEP and ordered bit-wise UEP. 

Also, we do not consider an $[n;k]_q$ linear code $C$ just as a subspace of
$\mathbb{F}_q^n$, but as a map $g : I \to \mathbb{F}_q^n$, where
$I = \mathbb{F}_q^k$ may be thought of as a source code. If we fix such an
encoding function $g : I \to \mathbb{F}_q^n$, we are actually distinguishing
between $g$ and $g \circ \sigma$, where $\sigma : I \to I$ is any permutation
of the information set. In this sense, we may say we are making a joint
source-channel coding (JSCC), in the same sense adopted for instance in [9],
where some quantized pieces of information are more relevant than others.

As a very simple application, we consider the picture below. It is a
gray-scale picture encoded at the source with 4 bits of information per pixel.
The information was encoded with the perfect $[7;4]_2$ binary Hamming code,
one codeword assigned to each pixel.



[Picture: gray-scale image showing the text "Hello World."]



Figure 2: Original picture. 

Using a random number generator, each of the seven bits of each pixel was
independently corrupted with error probability p = 0.3. The same received
picture was corrected twice, once using the usual ML decoder and once using a
decoder determined by a given poset (details in Appendix 11), which we call
for the moment just a P-decoder.

In Figure 3 we show in a single distinct color (purple in the colored
version) the pixels that were correctly decoded. The pixels that were
incorrectly decoded are shown in the (wrong) color to which they were decoded.
On the left side we see the result for the ML decoder and on the right side
the result for the P-decoder.




Figure 3: Correctly decoded pixels are colored purple; ML decoding on the
left and P-decoding on the right.

As expected, the picture on the left is much more color homogeneous
(purple-like), since using ML to decode with a perfect code minimizes the
number of errors. However, one can identify the pixels to be painted in purple
only when having the original picture. When looking at the picture as it was 
decoded using the two different decoding schemes, one gets a quite different 
perception: 




Figure 4: On the left ML decoding and on the right P-decoding. 

The right-hand image seems sharper and closer to the original picture
(Figure 2). This perception about the quality of those decoded pictures is an 
example of a way of valuing information, in a situation in which each of us, 
ordinary viewers, may be considered as a kind of expert. 

Despite the fact that those pictures^2 were made considering a very basic
model for encoding a gray-scale palette of colours, they are a good
illustration of the main points proposed in this work, including the fact that
ML decoding is not always better and that poset decoders may give better results.



2 Organization 

Throughout this work, we study only linear codes and consider transmission
either over a general, unspecified discrete channel or, sometimes, over a
discrete symmetric memoryless channel (DSMC). We also assume throughout that
every codeword is transmitted with the same probability. Although

^2 All the pictures illustrating this section were produced using software
developed by Vanderson Martins do Rosario, a first-year undergraduate student
at Universidade Estadual de Maringa (UEM), to whom we are indebted.

those restrictions are not essential for most definitions introduced in this
work (except the linearity of the codes under consideration), we prefer to
restrict ourselves to this context, since dealing with more general channels
or with codewords transmitted with different frequencies becomes too intricate
for this initial approach.

This work is organized as follows. In Section 3 we recall the basic facts
about maximum likelihood (ML), maximum a posteriori (MAP) and
nearest-neighbor (NN) decoders. In Section 4 we introduce the main concepts
and definitions used in this work: value function, loss function, overall
expected loss and Bayes decoder. The main result in this section,
Proposition 1, characterizes the expected loss for a DSMC. In Section 5 we
describe an analogue of Shannon's theorem for valued information
(Theorem 1). After proving those general and structural results, in
Section 6 we restrict the set of decoders to the set of NN decoders relative
to the poset metrics. Considering the difference between the expected loss of
different NN poset decoders, we determine a simple condition that assures the
existence of two nonempty subsets of value functions where one of those NN
poset decoders is better than the other (in terms of minimizing the expected
loss) and vice versa (Theorem 2). In Section 7 we present the existence
results of this work. The first one states that for any linear code and any
ML decoder, there are always value functions for which it is better to use a
non-ML decoder (Theorem 4). In another result we show that, for a large
infinite family of pairs (P, Q) of posets (called (I, J)-decomposable posets),
there are codes for which better results (in terms of total expected loss)
may be attained either by a P-NN decoder or by a Q-NN decoder, according to
the value given to each information (Theorem 6). In Section 8 we work with
signal constellations in a continuous channel, defining in a similar way what
an expected loss function is and showing (Theorem 8) that ML decoders are
not necessarily better than other decoders.

This is not a work that gives answers to known questions, but rather a work
that aims to show both the convenience and the viability of considering the
value of information. Consequently, many of the questions that arise are not
answered. Section 9 is devoted to some final remarks and open problems, but
there are also some open questions stated along the text, next to the matter
and propositions that gave rise to them.

In order to make the reading of the work more fluent, we decided to gather
most of the proofs in Appendix 10. Finally, in Appendix 11, we present the
details about the coding schemes used to produce the pictures presented in the
Introduction.



3 Useful Background: ML, MAP and NN Decoders

Let $\mathbb{F}_q^n$ be the linear space of $n$-tuples over a finite field
$\mathbb{F}_q$ and $C \subseteq \mathbb{F}_q^n$ be an $[n;k]_q$ linear code.
Let $d_H(\cdot,\cdot)$ be the usual Hamming distance: $d_H(x,y)$ is the
number of coordinate positions in which $x$ and $y$ differ.

A discrete channel (DC) over $\mathbb{F}_q$ is characterized by the set of
conditional probabilities $\{P(b \mid a) : a, b \in \mathbb{F}_q\}$, where
$P(b \mid a)$ represents the probability of receiving the symbol $b$ given
that the symbol $a$ has been transmitted. We assume that the channel is
memoryless (DMC), that is

$$ P(y \mid x) = \prod_{i=1}^{n} P(y_i \mid x_i) $$

where $x = (x_1, \ldots, x_n)$ and $y = (y_1, \ldots, y_n)$ represent $n$
consecutive transmitted and received symbols, respectively. A DMC over
$\mathbb{F}_q$ is called symmetric (DSMC) with crossover probability $p$ if

$$ P(y \mid x) = (1-p)^{n - d_H(x,y)} \left( \frac{p}{q-1} \right)^{d_H(x,y)}. $$
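As a quick numerical illustration of the DSMC model, the following Python sketch evaluates $P(y \mid x)$ from the Hamming distance and checks that, for a fixed transmitted word, the probabilities of all received words add up to 1. The function names and the representation of symbols of $\mathbb{F}_q$ as the integers 0, ..., q-1 are assumptions made only for this example.

```python
from itertools import product


def hamming_distance(x, y):
    """Number of coordinate positions in which x and y differ."""
    return sum(a != b for a, b in zip(x, y))


def dsmc_likelihood(y, x, p, q=2):
    """P(y | x) on a q-ary DSMC with crossover probability p.

    Symbols are modelled as integers 0..q-1 (an illustrative choice).
    """
    n = len(x)
    d = hamming_distance(x, y)
    return (1 - p) ** (n - d) * (p / (q - 1)) ** d


# For a fixed transmitted word, the likelihoods of all q^n received
# words must sum to 1.
x = (0, 1, 2)
total = sum(dsmc_likelihood(y, x, p=0.3, q=3)
            for y in product(range(3), repeat=3))
```

Here `total` should equal 1 up to floating-point rounding, for any choice of `x`, `p` and `q`.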



Considering that a vector $y$ is received through the channel, there are two
plausible criteria to decide how to decode it. We can decode $y$ as a codeword
$c_y$ such that

$$ P(y \mid c_y) = \max_{c \in C} P(y \mid c) $$

or we can decode $y$ as a codeword $c_y$ such that

$$ P(c_y \mid y) = \max_{c \in C} P(c \mid y). $$

The first decoding criterion is called maximum likelihood decoder (ML). The
second decoding criterion is called maximum a posteriori decoder (MAP). Since
we are assuming that each codeword $c$ is transmitted with probability
$P(c) = 1/M$, with $M = q^k$, it follows from Bayes' rule that both ML and MAP
decoders coincide with the nearest-neighbor decoder (NN) in a DSMC (for
$p < (q-1)/q$, so that the likelihood decreases with the Hamming distance):

$$ d_H(y, c_y) = \min_{c \in C} d_H(y, c). $$

For each decoding criterion above we may define a (generally not unique) map
$a : \mathbb{F}_q^n \to C$ such that

$$ d_H(y, a(y)) = \min_{c \in C} d_H(y, c). $$

In general, a decoding scheme (or just decoder) is just a map

$$ a : \mathbb{F}_q^n \to C. $$

It is reasonable to require that $a(c) = c$ for all $c \in C$ and in this
situation we call $a$ an ordinary or reasonable decoder. Let $D(c)$ be the
decision region of $c$ relative to the decoding scheme $a$:

$$ D(c) := a^{-1}(c) = \{ y \in \mathbb{F}_q^n : a(y) = c \}. $$

The decision regions $D(c)$ of a decoder $a$ determine a partition of
$\mathbb{F}_q^n$. Given a decoding scheme, an error occurs if $c$ is sent and
the received word lies in some decision region $D(c')$, with $c' \neq c$. The
probability of error is therefore

$$ P_e(c) = 1 - \sum_{y \in D(c)} P(y \mid c), $$

where the sum runs over all $y \in D(c)$. As the probability distribution of
$C$ is uniform, the decoding error probability of $C$ is the average

$$ P_e(C) = \frac{1}{M} \sum_{c \in C} P_e(c). $$

We now let $\mathbb{R}_+$ denote the set of non-negative real numbers and
consider the map

$$ \mu_{0-1} : C \to \mathbb{R}_+ $$

given by

$$ \mu_{0-1}(c) = \begin{cases} 0 & \text{if } c = 0, \\ 1 & \text{if } c \neq 0. \end{cases} $$

It follows that

$$ P_e(C) = \frac{1}{M} \sum_{c \in C} \sum_{y \in \mathbb{F}_q^n} \mu_{0-1}(a(y) - c)\, P(y \mid c). $$

We remark that at this point it is essential to consider $C$ to be a linear
code, in order to ensure that $a(y) - c \in C$. The function $\mu_{0-1}$ is a
characteristic function that only detects decoding errors, but does not
distinguish between different decoding errors. We will use the notation
$\mu_{0-1}$ for any such function, independently of the code under
consideration.
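The expression of $P_e(C)$ through $\mu_{0-1}$ can be checked by brute force on a very small code. The sketch below, written only as an illustration, evaluates the double sum for the binary repetition code of length 3 with a nearest-neighbor decoder; the helper names, the restriction to the binary case (where $a(y) - c$ is a coordinate-wise XOR) and the choice $p = 0.1$ are assumptions of the example.

```python
from itertools import product


def hamming_distance(x, y):
    """Number of coordinate positions in which x and y differ."""
    return sum(a != b for a, b in zip(x, y))


def likelihood(y, c, p):
    """P(y | c) on a binary DSMC with crossover probability p."""
    d = hamming_distance(y, c)
    return (1 - p) ** (len(c) - d) * p ** d


def nn_decode(y, code):
    """Nearest-neighbor decoder; ties go to the first codeword in `code`."""
    return min(code, key=lambda c: hamming_distance(y, c))


def mu01(c):
    """The 0-1 value function: 0 on the null codeword, 1 elsewhere."""
    return 0 if not any(c) else 1


def error_probability(code, p):
    """P_e(C) = (1/M) sum_c sum_y mu01(a(y) - c) P(y | c), binary case."""
    n, M = len(code[0]), len(code)
    total = 0.0
    for c in code:
        for y in product((0, 1), repeat=n):
            # Over F_2 the difference a(y) - c is a coordinate-wise XOR.
            diff = tuple(a ^ b for a, b in zip(nn_decode(y, code), c))
            total += mu01(diff) * likelihood(y, c, p)
    return total / M


repetition = [(0, 0, 0), (1, 1, 1)]
pe = error_probability(repetition, 0.1)
```

For this code a decoding error occurs exactly when two or three bits are flipped, so `pe` should agree with $3p^2(1-p) + p^3 = 0.028$ for $p = 0.1$.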

In many real situations, it is reasonable to attribute different values to
different codewords, and this is what will be done in this work, considering
instead of $\mu_{0-1}$ value functions that may assume any (non-negative) real
value. A typical example of this situation is the transmission of digital
images illustrated in the introduction: small variations in the color value of
each pixel do not affect the perceived quality of the image. This work is
inspired by this very common kind of situation.

4 Value Functions and Expected Loss 

A value function for a linear code $C$ is just a map $\mu$ that associates to
each codeword a non-negative real number

$$ \mu : C \to \mathbb{R}_+, $$

and a loss function

$$ l : C \times \mathbb{F}_q^n \to \mathbb{R}_+ $$

given by $l(c, y) = \mu(a(y) - c)$ gives a measure of the loss when the
information $c \in C$ was sent and the information $y \in \mathbb{F}_q^n$ was
received and decoded as $a(y) \in C$. We remark that, since $C$ is linear, the
difference $a(y) - c$ is actually a codeword, hence it makes sense to consider
the value $\mu(a(y) - c)$. By doing so, we are evaluating the errors that may
occur during the process consisting of encoding, transmitting and decoding
information. In such a situation it is reasonable to require that
$\mu(0) = 0$ (we should not lose anything if everything went right along the
process). If this happens, we say the value function is reasonable. We make
this distinction since we will use some "unreasonable" value functions that
proved to be valuable for proving Theorem 4 in Section 7.

Given a linear code $C$, a decoder $a$ for $C$, and a value function
$\mu : C \to \mathbb{R}_+$, we define the expected loss of $a$ relative to
$\mu$ and to a received information $y$ to be the average

$$ \mathcal{L}_y(a, \mu) = \mathbb{E}(l(c, y)) = \sum_{c \in C} l(c, y)\, P(c \mid y) = \sum_{c \in C} \mu(a(y) - c)\, P(c \mid y). \qquad (1) $$
We define the overall expected loss of $a$ as the average of the expected loss
for all possible informations $y \in \mathbb{F}_q^n$,

$$ \mathbb{E}(\mathcal{L}(a, \mu)) = \sum_{y \in \mathbb{F}_q^n} \mathcal{L}_y(a, \mu)\, P(y), $$

where $P(y) = \sum_{c} P(c)\, P(y \mid c)$ is the probability of receiving
$y$. That expression can be rewritten as

$$ \mathbb{E}(\mathcal{L}(a, \mu)) = \sum_{y \in \mathbb{F}_q^n} \sum_{c \in C} \mu(a(y) - c)\, P(y \mid c)\, P(c). $$

Making the change of variable $\tau = a(y) - c$ we have that

$$ \mathbb{E}(\mathcal{L}(a, \mu)) = \sum_{\tau \in C} G_a(\tau)\, \mu(\tau) $$

where

$$ G_a(\tau) = \sum_{y \in \mathbb{F}_q^n} P(y \mid a(y) - \tau)\, P(a(y) - \tau). \qquad (2) $$

We remark that $\mathbb{E}(\mathcal{L}(a, \mu))$ actually depends on $C$ and
should be denoted as $\mathbb{E}(\mathcal{L}(a, \mu, C))$, but this dependence
on $C$ will be omitted when it should not cause any confusion. Also, to
shorten the notation and since no confusion may arise, we will denote

$$ \mathbb{E}(a, \mu) := \mathbb{E}(\mathcal{L}(a, \mu)). $$

Moreover, since we are considering the value of information, the total
expected loss depends not only on the code itself but also on the way the
information is mapped into the code. In other words, we are actually
considering a value function $\bar{\mu} : I \to \mathbb{R}_+$ where $I$ is a
source code. When we say that a code $C$ is given, we are assuming that an
embedding $g : I \to C \subseteq \mathbb{F}_q^n$ is given and that the
function $\mu : C \to \mathbb{R}_+$ is the unique function such that
$\mu \circ g = \bar{\mu}$.

We say that a decoder $a^*$ is a Bayes decoder for $C$ relative to the value
function $\mu$ and to the loss function $l$ if for each received information
$y \in \mathbb{F}_q^n$ it minimizes the expected loss, i.e.,

$$ \mathcal{L}_y(a^*, \mu) = \min_{a} \mathcal{L}_y(a, \mu), $$

where the minimum is taken over the set of all decoders $a$ of $C$.
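The definitions of overall expected loss and Bayes decoder can be made concrete by exhaustive search on a small binary code. The sketch below is only an illustration under stated assumptions: the code, the value function `mu` and the crossover probability are arbitrary choices, all helper names are hypothetical, and the Bayes decoder is obtained by minimizing $\sum_c \mu(c' - c) P(y \mid c)$ over candidate codewords $c'$, which is proportional to the expected loss because the codewords are equiprobable.

```python
from itertools import product


def hamming_distance(x, y):
    return sum(a != b for a, b in zip(x, y))


def likelihood(y, c, p):
    """P(y | c) on a binary DSMC with crossover probability p."""
    d = hamming_distance(y, c)
    return (1 - p) ** (len(c) - d) * p ** d


def difference(u, v):
    """u - v over F_2 (coordinate-wise XOR)."""
    return tuple(a ^ b for a, b in zip(u, v))


def overall_expected_loss(code, decoder, mu, p):
    """E(a, mu) = sum_y sum_c mu(a(y) - c) P(y | c) P(c), with P(c) = 1/M."""
    n, M = len(code[0]), len(code)
    return sum(
        mu[difference(decoder(y), c)] * likelihood(y, c, p) / M
        for y in product((0, 1), repeat=n)
        for c in code
    )


def ml_decoder(code):
    """Nearest-neighbor (ML) decoder; ties go to the first codeword."""
    return lambda y: min(code, key=lambda c: hamming_distance(y, c))


def bayes_decoder(code, mu, p):
    """For each y, pick the codeword minimizing the expected loss.

    With equiprobable codewords, P(c | y) is proportional to P(y | c),
    so the normalizing factor can be ignored in the minimization.
    """
    def decode(y):
        def loss(candidate):
            return sum(mu[difference(candidate, c)] * likelihood(y, c, p)
                       for c in code)
        return min(code, key=loss)
    return decode


# Illustrative binary linear code and value function (assumed values).
code = [(0, 0, 0, 0, 0), (1, 1, 1, 0, 0), (0, 0, 1, 1, 1), (1, 1, 0, 1, 1)]
mu = {code[0]: 0.0, code[1]: 1.0, code[2]: 0.1, code[3]: 1.0}
p = 0.2

loss_ml = overall_expected_loss(code, ml_decoder(code), mu, p)
loss_bayes = overall_expected_loss(code, bayes_decoder(code, mu, p), mu, p)
```

By construction `loss_bayes` cannot exceed `loss_ml`; how large the gap is depends on the value function chosen. The exhaustive search visits every received word, so this sketch is only practical for very short codes.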
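Given a DSMC with crossover probability $p$ and an $[n;k]_q$ linear code $C$
such that $P(c) = 1/M$ for all $c \in C$, we have that

$$ P(y \mid c) = (1-p)^{n - d_H(y,c)} \left( \frac{p}{q-1} \right)^{d_H(y,c)}. $$

Thus, in expression (2),

$$ G_a(\tau) = z \sum_{y \in \mathbb{F}_q^n} s^{d_H(y,\, a(y) - \tau)}, $$

where

$$ z = z(p) := \frac{(1-p)^n}{M} \quad \text{and} \quad s = s(p) := \frac{p}{(1-p)(q-1)}. $$

Dropping the multiplicative scaling factor $z$, we obtain:

Proposition 1 For a DSMC we have

$$ \mathbb{E}(a, \mu) = \sum_{\tau \in C} G_a(\tau)\, \mu(\tau) $$

where

$$ G_a(\tau) = \sum_{y \in \mathbb{F}_q^n} s^{d_H(y,\, a(y) - \tau)}. $$
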
5 Shannon's Theorem Analogue for $\mathbb{E}(a, \mu)$

Shannon's coding theorem of 1948 (see [23]) states that for a broad class of
communication channel models, given $\delta > 0$ and $R$ less than the channel
capacity, there exists an $[n;k]_q$ linear code with $k/n \geq R$ such that
$P_e(C) < \delta$. In this section we state and prove a version of Shannon's
theorem for valued information on a DSMC.

Let $C$ be an $[n;k]_q$ linear code. Given value functions
$\mu_1, \mu_2 : C \to \mathbb{R}_+$ that differ by a constant factor,
$\mu_1 = \lambda \mu_2$ for some $\lambda > 0$, the expected loss functions
differ by the same factor, hence we may say that $\mu_1$ and $\mu_2$ are
equivalent. Let $[\mu]$ be the equivalence class of the value function $\mu$.
The canonical representative of the class $[\mu]$ is defined to be the value
function $\nu \in [\mu]$ such that $\|\nu\|_\infty = 1$, where
$\|\cdot\|_\infty$ denotes the maximum norm

$$ \|\nu\|_\infty = \max\{\nu(c) : c \in C\}. $$

We can identify

$$ [V(C)] = \{ [\mu] : \mu : C \to \mathbb{R}_+ \text{ value function} \}, $$

the space of equivalence classes of value functions, with the set of canonical
representatives and consequently with the faces of the cube $[0,1]^M$. The
value function $\mu_{0-1}$ corresponds to a vertex of $[0,1]^M$.

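Let $V_0(C)$ be the set of canonical representatives $\mu$ of reasonable value
functions on $C$ (i.e. $\mu(0) = 0$). Since $0 \leq \mu(c) \leq 1$ for all
$c \in C$, it follows that:

$$ \begin{aligned}
\mathbb{E}(a, \mu) &= \sum_{c \in C} \Big( \sum_{y \in \mathbb{F}_q^n} \mu(a(y) - c)\, P(y \mid c) \Big) P(c) \\
&= \sum_{c \in C} \Big( \sum_{y \notin D(c)} \mu(a(y) - c)\, P(y \mid c) \Big) P(c) \\
&\leq \sum_{c \in C} \Big( 1 - \sum_{y \in D(c)} P(y \mid c) \Big) P(c).
\end{aligned} $$

Thus $\mathbb{E}(a, \mu)$ is bounded by the decoding error probability:

$$ \mathbb{E}(a, \mu) \leq P_e(C). $$

Therefore we have a version of Shannon's theorem for valued information on a
DSMC:

Theorem 1 For a DSMC let $C > 0$ be the capacity of the channel. For each
$\varepsilon > 0$, $R < C$ and $\mu \in V_0$ there exists an $[n;k]_q$ linear
code with $k/n \geq R$ such that $\mathbb{E}(a, \mu) < \varepsilon$.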
Open Problem 1 As was seen in the proof of the preceding theorem, since
$\mathbb{E}(a, \mu) \leq \mathbb{E}(a, \mu_{0-1}) = P_e(C)$, we ask if it is
possible to achieve reliable communication at rates superior to the Shannon
capacity. In other words, for a given value function $\mu$, given
$\varepsilon > 0$ there is a code $C$ such that
$\mathbb{E}(a, \mu) < \varepsilon$ (Theorem 1). Let $n_\mu(\varepsilon)$ be
the minimal possible length of such a code, so that the code has information
rate $k / n_\mu(\varepsilon)$ (where $k$ is the dimension of the code)^3.
Since $\mathbb{E}(a, \mu) \leq \mathbb{E}(a, \mu_{0-1})$ we have that
$n_\mu(\varepsilon) \leq n_{\mu_{0-1}}(\varepsilon)$ and we ask for a
characterization of the value functions for which
$n_\mu(\varepsilon) < n_{\mu_{0-1}}(\varepsilon)$ for every $\varepsilon$.
Moreover, we ask if there is a value function $\mu$ such that

$$ \lim_{\varepsilon \to 0} \frac{n_\mu(\varepsilon)}{n_{\mu_{0-1}}(\varepsilon)} < 1 $$

or even

$$ \lim_{\varepsilon \to 0} \frac{n_\mu(\varepsilon)}{n_{\mu_{0-1}}(\varepsilon)} = 0. $$

6 Poset Metrics and Expected Loss Differences 

The determination of Bayes decoders is a hard problem (in terms of
complexity). In order to have any hope of actually developing a communication
process that needs, at its very end, a decoding algorithm, we shall consider a
particular but large class of decoders, the nearest-neighbor (NN) decoders
determined by poset metrics. Besides being metrics, they exploit well the
structure of linear codes, since they are invariant under translations. As we
shall explain later, for many of those metrics there are very efficient
decoding algorithms available.

Poset metrics were introduced in the context of coding theory by Richard
Brualdi et al. in 1995 (see [6]). Since then, many contributions have been
made to the theory of poset codes. The works on the existence of new classes
of perfect codes (see [10], [12]), determination of identities of MacWilliams
type (P, [H]), Wei duality theorem ([IT]), P-MDS codes ([H]) and the isometry
groups ([E]) are examples of these contributions. Some particular families of
poset metrics have also been studied, the most common one being the family of
Niederreiter-Rosenbloom-Tsfasman metrics ([3], [12], [20], [21]), since
transmission over a set of parallel channels subject to fading and
^3 The decoder that is used to achieve such minimality is not relevant at this
point, only the minimality of $n_\mu(\varepsilon)$.

the noise process in a wireless fading system (see [21]) are suitable to be
modeled with such metrics.

We start by defining what a poset metric is. Let $[n] := \{1, 2, \ldots, n\}$
be a finite set with $n$ elements and let $\preceq$ be a partial order on
$[n]$. We call the pair $P = ([n], \preceq_P)$ a poset. When no confusion may
arise we will write simply $\preceq$ instead of $\preceq_P$. An ideal in $P$
is a subset $I \subseteq [n]$ satisfying the following condition: if
$j \in I$ and $i \preceq j$, then $i \in I$. Given a subset $X$ in $P$, we
denote by $\langle X \rangle$ the smallest ideal containing $X$, called the
ideal generated by $X$. If $x = (x_1, \ldots, x_n)$ and
$y = (y_1, \ldots, y_n)$ are two vectors in $\mathbb{F}_q^n$, then their
$P$-distance $d_P(x, y)$ is defined by

$$ d_P(x, y) = |\langle \{ i : x_i \neq y_i \} \rangle|, $$

where $|A|$ denotes the cardinality of $A$. Since the $P$-distance is a metric
on $\mathbb{F}_q^n$, it is also called poset metric (or $P$-metric). If $P$ is
an antichain order (or Hamming order), that is, an order where $i \preceq j$
iff $i = j$, the $P$-distance is
just the classical Hamming distance.
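The definition of the $P$-distance translates directly into a few lines of code. In the following sketch the poset is passed as a predicate `leq(i, j)` on the positions 1, ..., n that is assumed to already be a partial order (reflexive, antisymmetric and transitive); this interface and the function names are illustrative choices, not notation from the text.

```python
def poset_distance(x, y, leq):
    """d_P(x, y): size of the ideal generated by the positions where
    x and y differ. `leq(i, j)` must return True when i <= j in P."""
    n = len(x)
    support = [j for j in range(1, n + 1) if x[j - 1] != y[j - 1]]
    # The generated ideal is the set of positions lying below some
    # position of the support.
    ideal = {i for i in range(1, n + 1)
             if any(leq(i, j) for j in support)}
    return len(ideal)


def antichain(i, j):
    """Hamming order: only the trivial relations i <= i."""
    return i == j


def chain(i, j):
    """Chain order 1 <= 2 <= ... <= n."""
    return i <= j


x = (1, 0, 0, 1, 0)
zero = (0, 0, 0, 0, 0)
d_hamming = poset_distance(x, zero, antichain)  # positions {1, 4}
d_chain = poset_distance(x, zero, chain)        # positions {1, 2, 3, 4}
```

With the antichain the distance counts the differing positions, while with the chain it is the largest differing index, so the same pair of vectors is at distance 2 in the first metric and 4 in the second.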

Before we look at the expected loss for poset decoders, we briefly introduce
two families of posets that will be considered throughout this work.

A chain order over $[n]$ is an order where every two elements are comparable
(see Figure 5). A Niederreiter-Rosenbloom-Tsfasman $(n, m)$-order (NRT) over
$[nm]$ is an order formed by the disjoint union of $n$ chains, each chain
having $m$ elements (we call $m$ the length of the chain).

A hierarchical order $P$ over $[n]$ is an order for which there is a partition

$$ [n] = \bigcup_{\delta = 1}^{h} A_\delta $$

such that given $i \in A_{\delta_i}$ and $j \in A_{\delta_j}$ with
$i \neq j$, then $i \preceq_P j$ if, and only if, $\delta_i < \delta_j$. If we
denote $|A_\delta| = l_\delta$ we may say $P$ is an $(l_1, \ldots, l_h)$
hierarchical poset (see Figure 5). We remark that a $(1, \ldots, 1)$
hierarchical poset is the $(1, n)$ NRT-order, that is, the chain order, and an
$(n)$ hierarchical poset is the antichain (Hamming) order.

We stress that the first poset (with only trivial relations) gives rise to the
usual Hamming metric $d_H$ and the second one may also be viewed as a $(1, 3)$
NRT order.
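For hierarchical posets the ideal generated by the differing positions has a simple shape: it contains every level below the highest level in which an error occurs, plus the differing positions of that highest level. The sketch below uses this observation to compute the distance without building the ideal explicitly. It assumes, for illustration only, that the coordinates are ordered so that the first $l_1$ positions form level 1, the next $l_2$ positions form level 2, and so on; the function name is hypothetical.

```python
def hierarchical_distance(x, y, level_sizes):
    """d_P(x, y) for an (l_1, ..., l_h) hierarchical poset.

    Assumes positions are listed level by level, lowest level first.
    """
    errors = [a != b for a, b in zip(x, y)]
    # Split the error pattern into one block per level.
    blocks, start = [], 0
    for size in level_sizes:
        blocks.append(errors[start:start + size])
        start += size
    # Index of the highest level containing a differing position.
    top = max((k for k, block in enumerate(blocks) if any(block)),
              default=None)
    if top is None:
        return 0
    # All lower levels are in the ideal, plus the errors at the top level.
    return sum(level_sizes[:top]) + sum(blocks[top])


# Type (4): antichain on 4 positions, i.e. the Hamming metric.
d_a = hierarchical_distance((1, 0, 1, 0), (0, 0, 0, 0), (4,))
# Type (1, 1, 1): chain on 3 positions.
d_b = hierarchical_distance((0, 1, 0), (0, 0, 0), (1, 1, 1))
# Type (4, 2, 3): errors at position 1 (level 1) and position 5 (level 2).
d_c = hierarchical_distance((1, 0, 0, 0, 1, 0, 0, 0, 0), (0,) * 9, (4, 2, 3))
```

In the last case the ideal contains the whole first level (4 positions) and one position of the second level, so the distance is 5 even though only two coordinates differ.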

An ordinary decoder $a_P$ of an $[n;k]_q$ linear code $C$ is called a
nearest-neighbor $P$-decoder ($P$-NN) if

$$ d_P(y, a_P(y)) = \min_{c \in C} d_P(y, c) $$
Figure 5: Hierarchical posets of types (4), (1, 1, 1) and (4, 2, 3).



for every $y \in \mathbb{F}_q^n$. The set of all $P$-NN decoders associated
with $C$ will be denoted by $O_P(C)$. We denote by $O(C)$ the set of all such
decoders, for all poset metrics $d_P$ in $\mathbb{F}_q^n$:

$$ O(C) := \bigcup_{P} O_P(C) = \{ a_P : a_P \text{ is a } P\text{-NN decoder of } C \}. $$

We remark that all those are reasonable decoders.

Decoders in $O(C)$ are called poset decoders. A $Q$-NN decoder
$a_Q \in O(C)$ such that

$$ \mathbb{E}(a_Q, \mu) = \min_{a_P \in O(C)} \mathbb{E}(a_P, \mu) $$

is said to be a Poset-Bayes decoder for $C$ relative to the value function
$\mu$. The determination of Poset-Bayes decoders is a hard problem, since the
quantity of such decoders (for a fixed code $C$) grows exponentially with $n$.
Our strategy is to consider the difference between the total expected losses
of pairs of decoders in $O(C)$.

Let $P = ([n], \leq_P)$ and $Q = ([n], \leq_Q)$ be posets on $[n]$. Given a
linear code $C \subseteq \mathbb{F}_q^n$, we consider $P$-NN and $Q$-NN
decoders $a_P, a_Q : \mathbb{F}_q^n \to C$ with total expected loss functions
$\mathbb{E}_{a_P}(\mu) := \mathbb{E}(a_P, \mu)$ and
$\mathbb{E}_{a_Q}(\mu) := \mathbb{E}(a_Q, \mu)$ respectively. The total
expected loss difference between $\mathbb{E}_{a_P}(\mu)$ and
$\mathbb{E}_{a_Q}(\mu)$ is

$$ \mathbb{E}_{(a_P, a_Q)}(\mu) := \mathbb{E}_{a_P}(\mu) - \mathbb{E}_{a_Q}(\mu). $$
From Proposition 1 it follows that for each $0 < s < 1$ (we recall that $s$
depends only on the crossover probability $p$) the total expected loss
difference can be viewed as the restriction to the positive octant
$\mathbb{R}_+^M \subset \mathbb{R}^M$ ($M = q^k$) of the linear functional
$\mathbb{E}_{(a_P, a_Q)} : \mathbb{R}^M \to \mathbb{R}$ given by

$$ \mathbb{E}_{(a_P, a_Q)}(\mu) = \sum_{c \in C} T_{(a_P, a_Q, c)}(s)\, \mu(c), $$

where

$$ T_{(a_P, a_Q, c)}(s) = G_{a_P}(c) - G_{a_Q}(c) := G_{(a_P, c)}(s) - G_{(a_Q, c)}(s) $$

and $G_{(a, c)}(s)$ is defined as in Proposition 1. Let us label the codewords
as $C = \{c_0, c_1, \ldots, c_{M-1}\}$. Assuming that
$\mathbb{E}_{a_P}(\mu) \neq \mathbb{E}_{a_Q}(\mu)$, that is, that
$\mathbb{E}_{(a_P, a_Q)}$ is a non-null operator, then

$$ T_{(a_P, a_Q)}(s) = \left( T_{(a_P, a_Q, c_0)}(s), \ldots, T_{(a_P, a_Q, c_{M-1})}(s) \right) $$

is a vector in $\mathbb{R}^M$ orthogonal to $N_{(a_P, a_Q)}$, the kernel of
$\mathbb{E}_{(a_P, a_Q)}$, and it points toward the connected component of
$\mathbb{R}^M \setminus N_{(a_P, a_Q)}$ containing those functions $\mu$ for
which $\mathbb{E}_{(a_P, a_Q)}(\mu) > 0$.

If $V$ denotes the set of all value functions (the positive octant of
$\mathbb{R}^M$), it can be decomposed as

$$ V = V^+_{(a_P, a_Q)} \cup \left( N_{(a_P, a_Q)} \cap V \right) \cup V^-_{(a_P, a_Q)} $$

where $V^+_{(a_P, a_Q)}$ and $V^-_{(a_P, a_Q)}$ are the subsets of value
functions $\mu \in V$ for which $\mathbb{E}_{(a_P, a_Q)}(\mu) > 0$ and
$\mathbb{E}_{(a_P, a_Q)}(\mu) < 0$ respectively. We note that
$V^+_{(a_P, a_Q)}$ and $V^-_{(a_P, a_Q)}$ are both non-empty iff
$N_{(a_P, a_Q)}$ intersects the interior of the set of value functions $V$,
the positive octant of $\mathbb{R}^M$. Since each $\mu(c) \geq 0$, a necessary
and sufficient condition for the kernel $N_{(a_P, a_Q)}$ to intersect the
interior of the positive octant of $\mathbb{R}^M$ is that at least two
coordinates of the normal vector $T_{(a_P, a_Q)}(s)$ (the coefficients of the
linear combination $\mathbb{E}_{(a_P, a_Q)}(\mu)$) have different signs.

With this notation we give the following natural definition: 
Definition 1 Given an $[n;k]_q$ linear code $C$, a value function $\mu$ for
$C$, and decoders $a_P, a_Q \in O(C)$, we say that $a_P$ is better than $a_Q$
relative to $\mu$ if

$$ \mathbb{E}_{(a_P, a_Q)}(\mu) < 0, $$

that is, if $\mathbb{E}_{a_P}(\mu) < \mathbb{E}_{a_Q}(\mu)$.

We note that saying that $a_P$ is better than $a_Q$ relative to $\mu$ is just
a way of emphasizing the meaning of the statement
$\mu \in V^-_{(a_P, a_Q)}$. With the definition and notation above, we have
actually proved the following:

Theorem 2 Let $C$ be an $[n;k]_q$ linear code. Given two poset decoders
$a_P, a_Q \in O(C)$, there are value functions for which $a_P$ is better than
$a_Q$ and value functions for which $a_Q$ is better than $a_P$ iff there are
$c, c' \in C$ such that

$$ T_{(a_P, a_Q, c)}(s) < 0 < T_{(a_P, a_Q, c')}(s). $$

In this case, both $V^+_{(a_P, a_Q)}$ and $V^-_{(a_P, a_Q)}$ are nonempty
subsets of $V$.

Bayes decoders associated to the value function $\mu_{0-1}$ are the classical
ML decoders (see for example [2, Theorem 4.1.1]). In the context of expected
loss and restricting the problem to the class of Poset-Bayes decoders, we have:

Theorem 3 Let $H$ be the Hamming order on $[n]$ and $C$ be an $[n;k]_q$ linear
code. Then

$$ \mathbb{E}_{(a_H, a_P)}(\mu_{0-1}) \leq 0 $$

for any poset $P$ and any $a_P \in O(C)$, that is, $H$-NN is better than
$P$-NN for all orders $P$ on $[n]$. Therefore, $H$-NN decoders are Poset-Bayes
decoders for the value function $\mu_{0-1}$.

Up to this point there is no real advantage in dealing with poset decoders.
Such advantages will arise if we can give a positive answer to the following
questions:

(1) Given a linear code $C$ and the Hamming metric $d_H$, are there an $H$-NN
decoder $a_H$ and a poset decoder $a_P$ such that $V^+_{(a_H, a_P)}$ and
$V^-_{(a_H, a_P)}$ are nonempty? A positive answer would mean that, for a
nonempty set of value functions, the poset decoder $a_P$ is better than any
$H$-NN decoder.

(2) Given posets $P$ and $Q$, are there a linear code $C$ and decoders $a_P$
and $a_Q$ of $C$ such that $V^+_{(a_P, a_Q)}$ and $V^-_{(a_P, a_Q)}$ are
nonempty? A positive answer to this question means that every poset decoder is
relevant, depending on the code under consideration.

Partial answers to those questions are given in Section 7.
The following examples illustrate the concepts and questions presented in
this section.

Example 1 Let $H$ be the Hamming order and $P$ be the total order
$1 \leq_P 2 \leq_P 3 \leq_P 4 \leq_P 5$. For the $[5;2]_2$ binary code

$$ C = \{ c_0 = 00000,\ c_1 = 11100,\ c_2 = 00111,\ c_3 = 11011 \} $$

and appropriate decoders $a_H$ and $a_P$ of $C$ we have

$$ \begin{aligned}
\mathbb{E}_{a_H}(\mu) = {} & (4 + 20s + 8s^2)\, \mu(c_0) + {} \\
& (12s^2 + 12s^3 + 8s^4)\, \mu(c_1) + {} \\
& (12s^2 + 12s^3 + 8s^4)\, \mu(c_2) + {} \\
& (8s^2 + 16s^3 + 4s^4 + 4s^5)\, \mu(c_3),
\end{aligned} $$

$$ \begin{aligned}
\mathbb{E}_{a_P}(\mu) = {} & (4 + 10s + 10s^2 + 6s^3 + 2s^4)\, \mu(c_0) + {} \\
& (6s + 14s^2 + 10s^3 + 2s^4)\, \mu(c_1) + {} \\
& (2s + 6s^2 + 10s^3 + 10s^4 + 4s^5)\, \mu(c_2) + {} \\
& (2s + 10s^2 + 14s^3 + 6s^4)\, \mu(c_3),
\end{aligned} $$

hence

$$ \begin{aligned}
\mathbb{E}_{(a_H, a_P)}(\mu) = {} & (10s - 2s^2 - 6s^3 - 2s^4)\, \mu(c_0) + {} \\
& (-6s - 2s^2 + 2s^3 + 6s^4)\, \mu(c_1) + {} \\
& (-2s + 6s^2 + 2s^3 - 2s^4 - 4s^5)\, \mu(c_2) + {} \\
& (-2s - 2s^2 + 2s^3 - 2s^4 + 4s^5)\, \mu(c_3).
\end{aligned} $$

Each coefficient polynomial of $\mathbb{E}_{a_H}$ and $\mathbb{E}_{a_P}$ has
coefficients adding up to $2^5 = 32$, one term for each received word. Since
for each $0 < s < 1$

$$ T_{(a_H, a_P, c_0)}(s) = 10s - 2s^2 - 6s^3 - 2s^4 > 0 $$

and

$$ T_{(a_H, a_P, c_1)}(s) = -6s - 2s^2 + 2s^3 + 6s^4 < 0, $$

it follows from Theorem 2 that both $V^+_{(a_H, a_P)}$ and $V^-_{(a_H, a_P)}$
are nonempty subsets of $V$, that is, depending on the value function, the
$P$-NN decoder $a_P$ may be
better or worse than the usual $H$-NN decoder $a_H$.
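The coefficients $G_a(\tau)$ of Proposition 1 can be obtained by brute force for a code as small as the one in Example 1. The following sketch is an illustration under explicit assumptions: it works only for binary codes, it represents each $G_a(\tau)$ as the list of coefficients of $s^0, s^1, \ldots, s^n$, and it breaks ties in both nearest-neighbor decoders in favour of the first codeword in the list, which is one possible reading of "appropriate decoders". All function names are hypothetical.

```python
from itertools import product


def hamming_distance(x, y):
    return sum(a != b for a, b in zip(x, y))


def chain_distance(x, y):
    """d_P for the chain 1 <= 2 <= ... <= n: the largest differing
    position (1-based), or 0 when x == y."""
    return max((i + 1 for i, (a, b) in enumerate(zip(x, y)) if a != b),
               default=0)


def nn_decoder(code, dist):
    """Nearest-neighbor decoder for `dist`; ties are broken in favour of
    the first codeword in `code` (an assumption of this sketch)."""
    return lambda y: min(code, key=lambda c: dist(y, c))


def g_coefficients(decoder, tau):
    """Coefficients of G_a(tau) = sum_y s^{d_H(y, a(y) - tau)} as a list
    indexed by the power of s (binary case, so '-' is XOR)."""
    n = len(tau)
    coeffs = [0] * (n + 1)
    for y in product((0, 1), repeat=n):
        shifted = tuple(a ^ b for a, b in zip(decoder(y), tau))
        coeffs[hamming_distance(y, shifted)] += 1
    return coeffs


code = [(0, 0, 0, 0, 0), (1, 1, 1, 0, 0), (0, 0, 1, 1, 1), (1, 1, 0, 1, 1)]
a_h = nn_decoder(code, hamming_distance)
a_p = nn_decoder(code, chain_distance)

g_h = [g_coefficients(a_h, c) for c in code]
g_p = [g_coefficients(a_p, c) for c in code]
# Coefficients of the difference E_{(a_H, a_P)}, one list per codeword.
t = [[u - v for u, v in zip(gh, gp)] for gh, gp in zip(g_h, g_p)]
```

Under these assumptions `g_h[0]` is `[4, 20, 8, 0, 0, 0]`, that is, $4 + 20s + 8s^2$, and `g_p[1]` is `[0, 6, 14, 10, 2, 0]`, that is, $6s + 14s^2 + 10s^3 + 2s^4$. Every list sums to 32, and every list in `t` sums to 0, which gives a quick consistency check on tables of this kind.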

Example 2 Let us now consider the repetition code
$C = \{c_0 = 000,\ c_1 = 111\}$. Although trivial, this is an MDS perfect
code. Considering the ML decoder $a_H$ and a poset decoder $a_P$ determined by
the poset $P$ defined by the relations $1 \leq_P 2 \leq_P 3$ we find that

$$ \mathbb{E}_{(a_H, a_P)}(\mu) = (2s - 2s^2)\, \mu(000) + (-2s + 2s^2)\, \mu(111) $$

hence

$$ V^+_{(a_H, a_P)} = \{ \mu : \mu(000) > \mu(111) \} $$

and

$$ V^-_{(a_H, a_P)} = \{ \mu : \mu(000) < \mu(111) \}. $$

7 Relevance of Decoders and Codes 

In this section we show that, in quite general instances, every encoding and
every decoder may be relevant, depending on the value functions to be
considered. All proofs are postponed to Appendix 10.

We start with the result which shows that for any linear code and any ML
decoder, there are always value functions for which it is better to use a
non-ML decoder.

Theorem 4 Let $C$ be an $[n;k]_q$ linear code and $a$ an ML decoder. Then,
there exist a decoder $a'$ and value functions $\mu$ and $\bar{\mu}$ such that

$$ \mathbb{E}(a, \mu) > \mathbb{E}(a', \mu) $$

and

$$ \mathbb{E}(a, \bar{\mu}) < \mathbb{E}(a', \bar{\mu}), $$

for any given discrete channel.

In the proof of this theorem we make use of a decoder
$a_0 : \mathbb{F}_q^n \to C$ such that $a_0(y) = 0$ for all $y$ and of a value
function $\mu_{1-0}$ defined by $\mu_{1-0}(0) = 1$ and $\mu_{1-0}(c) = 0$ if
$c \neq 0$. Neither this decoder nor this value function is reasonable.

We can make some progress concerning the question of reasonable decoders considering a discrete channel. Before proceeding, given an $[n; k]_q$ linear code $C$ and $y \in \mathbb{F}_q^n$, we define $\arg(d_H(C, y))$ as the set of all codewords of $C$ closest to $y$ in the usual Hamming metric:

\[ \arg(d_H(C, y)) = \Big\{ c \in C : d_H(y, c) = \min_{\theta \in C} d_H(y, \theta) \Big\}. \]

Theorem 5 Let $C$ be an $[n; k]_2$ binary linear code and $y = (1, 1, \ldots, 1)$. If $|\arg(d_H(C, y))| > 1$, there are ML decoders $a_H$ and $\tilde{a}_H$ of $C$ such that $V^{+}_{(a_H,\tilde{a}_H)}$ and $V^{-}_{(a_H,\tilde{a}_H)}$ are both nonempty.

Let us state an important class of linear codes that satisfies the condition of Theorem 5.

Corollary 1 Let $d_H$ be the Hamming metric on $\mathbb{F}_2^n$ and $C$ be an $[n; k]_2$ binary linear code of constant weight $w$. If $k > 1$, then there is a P-NN decoder $a_P$ and an H-NN decoder $a_H$ of $C$ such that $V^{+}_{(a_H,a_P)}$ and $V^{-}_{(a_H,a_P)}$ are nonempty.
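The hypothesis $|\arg(d_H(C, y))| > 1$ is easy to check on a small constant weight code. The sketch below is only an illustration: the $[3; 2]_2$ code with all nonzero codewords of weight 2 is our own choice of example, and the function names are hypothetical.

```python
def hamming_distance(x, y):
    """Number of coordinates where x and y differ."""
    return sum(a != b for a, b in zip(x, y))

def arg_closest(code, y):
    """All codewords of `code` at minimal Hamming distance from y."""
    best = min(hamming_distance(y, c) for c in code)
    return [c for c in code if hamming_distance(y, c) == best]

# Illustrative [3; 2]_2 code: every nonzero codeword has weight w = 2.
code = [(0, 0, 0), (1, 1, 0), (1, 0, 1), (0, 1, 1)]
y = (1, 1, 1)

closest = arg_closest(code, y)
print(closest, len(closest))
```

Every nonzero codeword is at distance $n - w$ from the all-ones word, so the set of closest codewords is $C - \{0\}$, with $2^k - 1$ elements.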

Open Problem 2 Does Theorem 4 still hold if we impose the use of reasonable decoders and value functions? In Theorem 5 we considered a reasonable decoder (every P-NN decoder is reasonable), but had to impose some restrictions on the code. Is it possible to rule those conditions out? We do believe the answer to those questions is positive, but were not able to prove it.

Up to this moment we were considering a given (and fixed) code, and showed there are value functions for which it is better (in the sense of minimizing the expected loss) to decode using a non-Hamming decoder. Now we fix two different posets, one of them a poset $P$ that satisfies a special condition and the other the usual Hamming poset $H$, and show the existence of a code for which both $V^{+}_{(a_H,a_P)}$ and $V^{-}_{(a_H,a_P)}$ are nonempty.

Before stating the results we need some definitions. Given an order $P = ([n], \leq_P)$ the dual order $P^* = ([n], \leq_{P^*})$ is defined by the opposite relations: $x \leq_{P^*} y \iff y \leq_P x$. For simplicity, we shall omit the indices in $\leq_P$ and $\leq_{P^*}$ when no confusion may be caused. We remark that $(P^*)^* = P$. An ideal in $P^*$ is called a filter in $P$.

Given a nontrivial and proper filter $I$ in $P$ and $\emptyset \neq J \subset I$, we define

\[ I_J^{+} := \{ i \in I - J : i > j \text{ for some } j \in J \} \]

and

\[ I_J^{-} := \{ i \in I - J : i < j \text{ for some } j \in J \}. \]

We will say that a filter $I$ of $P$ is $J$-decomposable if

\[ I = I_J^{+} \cup J \cup I_J^{-} \]

is a partition of $I$ with both $I_J^{+}$ and $I_J^{-}$ nonempty. If there exists a filter $I$ in $P$ that is $J$-decomposable, we will say that $P$ is $(I, J)$-decomposable.
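The definition can be tested mechanically on small posets. The following sketch assumes that the order is supplied as a predicate `leq(x, y)` on the ground set; the function names are ours, and the checks that $I$ is nontrivial and proper and that $J$ is nonempty are left to the caller.

```python
def is_filter(I, elements, leq):
    """I is a filter (up-set): it contains every element above one of its own."""
    return all(y in I for x in I for y in elements if leq(x, y))

def decomposition(I, J, leq):
    """Return (I_J^+, I_J^-): elements of I - J strictly above, respectively
    strictly below, some element of J."""
    rest = I - J
    upper = {i for i in rest if any(leq(j, i) and i != j for j in J)}
    lower = {i for i in rest if any(leq(i, j) and i != j for j in J)}
    return upper, lower

def is_J_decomposable(I, J, elements, leq):
    """Check that I is a filter and that I_J^+, J, I_J^- partition I with
    both I_J^+ and I_J^- nonempty."""
    upper, lower = decomposition(I, J, leq)
    return (is_filter(I, elements, leq)
            and bool(upper) and bool(lower)
            and not (upper & lower)
            and upper | lower == I - J)

elements = {1, 2, 3, 4, 5}
chain = lambda x, y: x <= y        # total order 1 < 2 < ... < 5
antichain = lambda x, y: x == y    # Hamming order: no two points comparable

I, J = {3, 4, 5}, {4}
print(decomposition(I, J, chain), is_J_decomposable(I, J, elements, chain))
print(is_J_decomposable(I, J, elements, antichain))
```

In the chain the filter $\{3, 4, 5\}$ splits around $J = \{4\}$, whereas in the antichain no element is comparable to an element of $J$, so no filter is $J$-decomposable.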

Let now $\{e_i : 1 \leq i \leq n\}$ be the usual basis of $\mathbb{F}_q^n$. For each nonempty subset $X \subseteq [n]$ let

\[ C_X = \mathrm{span}\{e_i : i \in X\} \]

be the coordinate subspace with support in $X$. Given $y = \sum_{i=1}^{n} y_i e_i \in \mathbb{F}_q^n$, denote by $y_X$ its projection onto $C_X$:

\[ y_X = \sum_{i \in X} y_i e_i. \]

Given an $(I, J)$-decomposable order $P$, with $|I| = K$ and $|J| = k$, it determines an $[n; K - k]_q$ linear code $C_{(I,J)}$ that is just the coordinate space $C_{I-J}$, i.e.,

\[ C_{(I,J)} = \mathrm{span}\{e_i : i \in I - J\}. \]

We call those subspaces BGL codes, after the description of perfect codes given by Brualdi, Graves and Lawrence in 1995 ([?, Theorem 2.1]). The complement of a subset $X \subseteq [n]$ is denoted by $X^c$.

Theorem 6 Let $P = ([n], \leq_P)$ be an $(I, J)$-decomposable order and $H = ([n], \leq_H)$ be the Hamming order. Considering the $[n; |I| - |J|]_q$ BGL code $C_{(I,J)}$, there are NN decoders $a_H$ and $a_P$ of $C_{(I,J)}$ and codewords $c, c' \in C_{(I,J)}$ such that both $V^{+}_{(a_H,a_P)}$ and $V^{-}_{(a_H,a_P)}$ are nonempty for every $0 < s < 1$.

In general it is not easy to compute the polynomial $T_{(a_H,a_P,c)}(s)$. However, for appropriate P-NN decoders of $C_{(I,J)}$ it is possible to determine $T_{(a_H,a_P,c)}(s)$:

Corollary 2 Consider an $(I, J)$-decomposable order $P$ on $[n]$. For the BGL code $C_{(I,J)}$ and for the P-NN decoder $a_P$ determined in Theorem 6 we have that

T, ^ ( q\ — o^Hiy,c) _ dHiy,c-c) 

for every $c \in C_{(I,J)}$.

It is easy to see that the class of $(I, J)$-decomposable orders includes the $(n, m)$-NRT poset for $m \geq 4$, hence the following corollary holds:

Corollary 3 Let $H$ be the Hamming order on $[nm]$ and let $P$ be the NRT $(n, m)$-order on $[nm]$. Then, for $m \geq 4$ there exists an $[n; k]_q$ linear code $C$ and $c, c' \in C$ such that

\[ T_{(a_H,a_P,c)}(s) < 0 < T_{(a_H,a_P,c')}(s) \]

for some H-NN and P-NN decoders $a_H$ and $a_P$ of $C$ respectively. Therefore, $V^{+}_{(a_H,a_P)}$ and $V^{-}_{(a_H,a_P)}$ are nonempty.

If we consider the particular case when $P$ is the $(1, m)$-NRT order and $m \geq 4$, then every proper filter $I$ of $P$ with $|I| \geq 3$ is decomposable: given $I = \{m - k + 1, \ldots, m\}$ and $1 < j < k$, then

\[ I = I^{+}_{\{m-k+j\}} \cup \{m - k + j\} \cup I^{-}_{\{m-k+j\}} \]

is a nontrivial partition of $I$ and the following holds:

Corollary 4 Let $H$ be the Hamming order on $[m]$ and let $P$ be the NRT $(1, m)$-order. If $m \geq 4$, then for each $2 \leq k \leq m - 2$ there exists an $[m; k]_q$ linear code $C$ such that both $V^{+}_{(a_H,a_P)}$ and $V^{-}_{(a_H,a_P)}$ are nonempty for some H-NN and P-NN decoders $a_H$ and $a_P$.

We remark that, in the proof of Theorem 6, the only property of the Hamming poset $H$ we used was the fact that $a_H(y) = y_{I-J}$ is an H-NN decoder. Actually this is true for any poset on $[n]$ that is $(I, J)$-decomposable. It follows that the result obtained in Theorem 6 also holds for any such pair of posets. Let $P$ and $Q$ be a pair of orders on $[n]$. Suppose that $P$ is $(I, J)$-decomposable. We will say that $P$ is $(I, J)$-isomorphic to $Q$ if $I$ is still a filter in $Q$. In this condition $I$ is also $J$-decomposable in $Q$.

Theorem 7 Consider proper subsets $\emptyset \neq J \subset I \subset [n]$. Let $P$ and $Q$ be posets on $[n]$ such that $I$ is a filter in both $P$ and $Q$. Then, if both $P$ and $Q$ are $(I, J)$-decomposable, there is an $[n; k]_q$ linear code $C$ and NN decoders $a_P$ and $a_Q$ of $C$ such that $V^{+}_{(a_P,a_Q)}$ and $V^{-}_{(a_P,a_Q)}$ are nonempty.

Open Problem 3 We do believe that the $(I, J)$-decomposable condition in the statement of Theorem 7 is not necessary, and neither is the condition that the channel be symmetric. The right question, we believe, is the following: find necessary and sufficient conditions on two posets that guarantee the existence of a code that may be better corrected by using either of the poset metrics (depending on the value functions).



8 Value Functions for the Continuous Channel 

The concept of expected loss defined for a discrete channel can naturally be adapted for continuous channels. We do not go as far as in the discrete channel case and restrict ourselves to giving appropriate definitions.

Let $S = \{s_1, \ldots, s_M\}$ be a finite signal constellation in the Euclidean $N$-dimensional space $\mathbb{R}^N$. We should now proceed to introduce value to the signals. As a matter of fact, in a situation similar to that developed for the discrete channel, we are actually valuing the errors (after decoding), which was not totally evident in the discrete case since we considered just linear codes, hence every error (after decoding) is a codeword. For this reason we consider the difference set

\[ \Delta S := S - S = \{s_i - s_j : s_i, s_j \in S\}. \]

A value function for the constellation $S$ is any function

\[ \mu : \Delta S \to \mathbb{R}_+. \]

Consider the continuous channel defined by the family of probability density functions $p(y|x)$, with $x, y \in \mathbb{R}^N$. We define the overall expected loss of $S$ relative to the value function $\mu : \Delta S \to \mathbb{R}_+$ and decoder $a : \mathbb{R}^N \to S$ as the average

\[ E(\mathcal{L}(a, \mu)) = \int_{\mathbb{R}^N} \mathcal{L}_y(a, \mu)\, p(y)\, dy \]

where

\[ \mathcal{L}_y(a, \mu) = \sum_{s_i \in S} \mu(a(y) - s_i)\, p(s_i|y) \]

is the expected loss for an observed $y$. As in the discrete channel case, $E(a, \mu_{0-1})$ coincides with the decoding error probability $P_e(S)$ of $S$. As in the discrete case, we denote $E(\mathcal{L}(a, \mu))$ simply by $E(a, \mu)$.

Footnote: BGL codes have some nice properties, including the possibility of expressing the packing radius as a function of the minimal distance, which is not an easy task for general codes and posets.

Also the overall expected loss $E(a, \mu)$ can be interpreted as the restriction of a linear functional with domain $\mathbb{R}^{|\Delta S|}$ into $\mathbb{R}_+$:

\[ E(a, \mu) = \sum_{\tau \in \Delta S} G_a(\tau)\, \mu(\tau) \]

with

\[ G_a(\tau) = \sum_{s_j - s_i = \tau} \int_{R(s_j)} p(y|s_i)\, P(s_i)\, dy \]

where $R(s_j) = a^{-1}(s_j)$ is the decision region of the signal $s_j$.
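For a one-dimensional constellation the coefficients $G_a(\tau)$ can be evaluated in closed form. The sketch below is only an illustration under assumptions that are not in the text: additive Gaussian noise with standard deviation `sigma`, a uniform prior on the signals, and the nearest-point decoder; the constellation $\{0, 1, 3\}$ and all function names are hypothetical.

```python
import math

def phi(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def g_coefficients(points, sigma):
    """G_a(tau) for the nearest-point decoder on the real line, assuming
    Gaussian noise and a uniform prior. Decision regions are the intervals
    bounded by midpoints of consecutive constellation points."""
    pts = sorted(points)
    bounds = [-math.inf] + [(a + b) / 2 for a, b in zip(pts, pts[1:])] + [math.inf]
    prior = 1.0 / len(pts)
    G = {}
    for s_i in pts:                      # transmitted signal
        for j, s_j in enumerate(pts):    # decoded signal
            lo, hi = bounds[j], bounds[j + 1]
            mass = phi((hi - s_i) / sigma) - phi((lo - s_i) / sigma)
            G[s_j - s_i] = G.get(s_j - s_i, 0.0) + prior * mass
    return G

def expected_loss(G, mu):
    """E(a, mu) = sum over tau of G_a(tau) * mu(tau)."""
    return sum(g * mu(tau) for tau, g in G.items())

G = g_coefficients([0.0, 1.0, 3.0], sigma=0.5)
error_probability = expected_loss(G, lambda tau: 0.0 if tau == 0 else 1.0)
print(sorted(G.items()), error_probability)
```

With the 0-1 value function the expected loss reduces to $1 - G_a(0)$, the decoding error probability, in agreement with the remark above.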
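Now consider the difference of the expected losses

\[ E_{(a,\tilde{a})}(\mu) = E(a, \mu) - E(\tilde{a}, \mu) \]

relative to the value function $\mu$ and the pair $(a, \tilde{a})$ of decoders of $S$. We have that

\[ E_{(a,\tilde{a})}(\mu) = \sum_{\tau \in \Delta S} T_{(a,\tilde{a})}(\tau)\, \mu(\tau) \]

where

\[ T_{(a,\tilde{a})}(\tau) := G_a(\tau) - G_{\tilde{a}}(\tau) \]

for each $\tau \in \Delta S$.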

With the definitions properly established, it is possible to prove that ML decoders on $\mathbb{R}^N$, determined by the Voronoi regions, are not always the best decoders. More generally:

Theorem 8 Let $S = \{s_1, \ldots, s_M\}$ be a signal constellation in $\mathbb{R}^N$ such that for some $\tau \in \Delta S$ there is a unique pair $(s_i, s_j)$ of signals in $S$ such that $s_j - s_i = \tau$. Consider a decoder $a : \mathbb{R}^N \to S$ for $S$ and assume that each decision region of the decoder $a$ has non-empty interior. Then there is another decoder $\tilde{a} : \mathbb{R}^N \to S$ for $S$ such that

\[ T_{(a,\tilde{a})}(\tau) > 0 \quad \text{and} \quad T_{(a,\tilde{a})}(-\tau) < 0. \]



9 Final Remarks

In this final section we present some remarks concerning many aspects of coding theory that either demand a different formulation in the context of expected loss or raise interesting problems that we believe deserve to be explored.

9.1 Remarks about Poset Decoders

Poset codes were given a distinctive position in this work, but their actual importance was not truly explained. The first motivation to consider poset decoders is the fact that some of them admit efficient decoding algorithms, in deep contrast with the usual setting of ML decoders, where general decoding is known to be NP-complete (see [?]). Indeed, for an (n)-hierarchical poset (or equivalently, an NRT $(1, n)$-order), the kind used to produce the right-side pictures in the Introduction, the decoding algorithm is linear in the co-dimension of the code [20, Section IV-D]. Besides those posets, for a general hierarchical poset, there are algorithms that are at least as efficient as syndrome decoding and with high probability significantly faster [S]. While an (n)-hierarchical poset is unique (up to order isomorphism) for any $n \in \mathbb{N}$, the hierarchical posets in their generality are a large family, corresponding to ordered partitions of $n$, hence there are $\sim 2^{n}$ such posets, which should provide many possibilities for each code length.

When considering the dimension $k$ and the length $n$ of a code as given, the usual task of error correcting is to find a code with better performance. If this is already an intractable computational problem, the goal posed in this work is much more complex: finding a pair, consisting of a code and a decoder. Here comes another reason to restrict ourselves to poset decoders, or more specifically, to hierarchical poset decoders, since there is a heuristic approach to find better results for the expected loss function: since coordinates that have a large value (that is, those for which $i$ is large) are best protected, we should make a code such that the more relevant information is concentrated as non-null entries in those coordinates and define a poset that has those coordinates as maximal elements.
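To make the role of hierarchical posets concrete, the following sketch computes the poset weight for a hierarchical order. It assumes the usual description of such a poset as an ordered list of levels in which every element of a level lies below every element of the next one, so that the ideal generated by a support consists of all levels below the highest nonzero one plus the nonzero positions on that level. The representation by `levels` and the function names are our own, not notation from the text.

```python
def hierarchical_weight(x, levels):
    """Poset weight of x for a hierarchical poset (assumed representation).

    `levels` lists the 0-based coordinate positions of each level, lowest
    level first. The weight is the size of the ideal generated by supp(x).
    """
    top = None
    for h, level in enumerate(levels):
        if any(x[i] != 0 for i in level):
            top = h                       # highest level meeting supp(x)
    if top is None:
        return 0
    below = sum(len(level) for level in levels[:top])
    return below + sum(1 for i in levels[top] if x[i] != 0)

def hierarchical_distance(x, y, levels, q=2):
    """Poset distance: weight of the coordinatewise difference modulo q."""
    return hierarchical_weight([(a - b) % q for a, b in zip(x, y)], levels)

chain = [[0], [1], [2], [3]]       # NRT (1, 4)-order: a single chain
antichain = [[0, 1, 2, 3]]         # Hamming order: a single level
two_levels = [[0, 1], [2, 3]]

x = (1, 0, 1, 0)
print(hierarchical_weight(x, chain),
      hierarchical_weight(x, antichain),
      hierarchical_weight(x, two_levels))
```

For the chain the weight is the largest nonzero position, so an error in a late coordinate is heavily penalized by the metric; this is the protection of late coordinates that the heuristic relies on.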

Open Problem 4 Is it possible to prove that under suitable conditions this kind of heuristic will work? To be more explicit, suppose there are $M = q^k$ pieces of information $A = \{x_1, \ldots, x_M\}$ with $\mu(x_i) \sim \lambda^i$ for some constant $\lambda > 1$. Is it true that given an $[n, k]_q$ linear code $C$ it is possible to find a decoder $a_P$ determined by an NRT $(1, n)$-order $P$ for which $E(a_P, \mu) < E(a_H, \mu)$ for every Hamming decoder $a_H$? Can we make the same statement when $C$ is a perfect or MDS code? More generally, if the information set can be partitioned as

\[ A = A_1 \cup A_2 \cup \cdots \cup A_r \]

with $\mu(x_j) \sim \lambda^i$ if $x_j \in A_i$, should we use a decoder determined by a hierarchical poset?

9.2 Remarks about the Space of Codes and Decoders 

The introduction of value functions and the need to consider decoders that may not be ML decoders enlarges considerably the space where we are actually working. We can assume as reasonable that the quantity $q^k$ of information and the value of the information are given, depending on the application and the kind of knowledge the information constitutes. If we suppose for instance that the cost of transmission is determined by the length $n$ (the same would hold if a maximal bound was established for the expected loss), we are looking for a pair consisting of an $[n; k]_q$ linear code and a decoder associated to the code. Moreover, we are not only interested in the code $C \subseteq \mathbb{F}_q^n$, but actually in how the set of information $A$ is mapped onto $C$, so we are actually considering a code not only as a subset $C \subseteq \mathbb{F}_q^n$ but as an embedding in $\mathbb{F}_q^n$. In other words, we should consider not only the subset $C \subseteq \mathbb{F}_q^n$ but also all the permutations $\sigma : C \to C$. In this sense, the space of pairs (code, decoder) where we are searching for possibilities to minimize the expected loss function is a space with

\[ \left( \prod_{i=1}^{k} \frac{q^{n-k+i} - 1}{q^{i} - 1} \right) \cdot (q^k)! \cdot (q^k)^{q^n} \]

elements, where the first factor corresponds to the cardinality of the Grassmannian $G(n, k)$, the second to the permutations of $C$ and the last one to the decoders of a given $[n; k]_q$ code (including the unreasonable ones).
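The size of this search space can be evaluated for very small parameters. The sketch below assumes the three factors exactly as described in the text: the number of $k$-dimensional subspaces of $\mathbb{F}_q^n$ (a Gaussian binomial coefficient), the $(q^k)!$ permutations of the codewords, and the $(q^k)^{q^n}$ maps from $\mathbb{F}_q^n$ to the code. The function names are ours.

```python
from math import factorial

def gaussian_binomial(n, k, q):
    """Number of k-dimensional subspaces of F_q^n (size of the Grassmannian)."""
    num = den = 1
    for i in range(1, k + 1):
        num *= q ** (n - k + i) - 1
        den *= q ** i - 1
    return num // den

def search_space_size(n, k, q):
    """Count (code, labeling, decoder) triples, with every map F_q^n -> C
    counted as a decoder, including the unreasonable ones."""
    subspaces = gaussian_binomial(n, k, q)
    labelings = factorial(q ** k)
    decoders = (q ** k) ** (q ** n)
    return subspaces * labelings * decoders

for n, k in [(3, 1), (4, 2), (7, 4)]:
    size = search_space_size(n, k, 2)
    print(n, k, len(str(size)), "decimal digits")
```

Even for the $[7; 4]_2$ parameters of Appendix 2 the count has well over a hundred decimal digits, which illustrates why restricting to a structured family such as hierarchical poset decoders is attractive.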

Open Problem 5 To estimate asymptotically the quotient

\[ \frac{DP_n}{P_n}, \]

where $DP_n$ is the number of NN poset decoders, may be an interesting question in itself. There is no known estimate of $DP_n$, but for $P_n$, the number of posets on a set of $n$ elements, the exact asymptotic is known [25]: for odd $n$




9.3 Remarks about Bounds for Expected Loss 

The error probability function $P_e(C)$ is one of the fundamental parameters to measure the performance of a coding scheme. Although it has a simple formulation, actual calculations are generally hard problems. For this reason, finding good (lower and upper) bounds is a fundamental question. Among the well known bounds of this kind we find the union bound and the Bhattacharyya, Gallager, de Caen and sphere packing bounds (see [22]).

Considering the total expected loss, we already saw that the error probability $P_e(C)$ is an upper bound for $E(a, \mu)$, but it is obviously far from being a tight one. Calculating $E(a, \mu)$ is not only prohibitive, but involves many parameters that should be treated separately.

Open Problem 6 Consider a fixed family of value functions and search for upper bounds for $E(a, \mu)$. One relevant family that may be interesting for protecting information of different nature (as done in [?]) suggests considering a value function $\mu_{0-\lambda-1}$ such that $C$ can be partitioned as $C = C_0 \cup C_\lambda \cup C_1$ where $\mu_{0-\lambda-1}(c) = j$ if $c \in C_j$. More general situations are found where $C$ is expressed as

\[ C = C_0 \cup C_1 \cup \ldots \cup C_r \]

and the value function is either linear ($\mu_L$) or exponential ($\mu_E$), in the sense that

jiL (c) = if ce Cj 

where b > 1 is a constant. We remark that the fractions ^ and ^ are just 
scaling factors. 

Open Problem 7 Considering a family of value functions amounts to aiming at coding schemes for families of applications with similar semantic value, and this is data determined by the practical (or theoretical) problem. If instead we look at the tools we are able to manage, the natural question would be to find bounds for the expected loss when considering a particular family of NN poset decoders, especially those determined by hierarchical posets.

Finally: 

Open Problem 8 Both problems presented above are still very hard: in each of them there is one free parameter that we do not find in the classical case, where both the value function ($\mu_{0-1}$) and the decoder type (ML) are fixed. So we can combine the two previous problems and ask for bounds for the expected loss function when fixing a family of value functions and a type of NN decoder.

9.4 Remarks about Rate Distortion Theory 

The basic question concerning rate distortion theory, as posed by Kolmogorov ([15]) and Shannon ([21]), is: given a source distribution and a distortion measure, what is the minimum rate description required to achieve a particular distortion? For the expected loss function, the question may be stated as follows:

Open Problem 9 Let $\lfloor x \rfloor$ denote the integer part of $x \in \mathbb{R}$. Let $X$ be a set of information and $\mu$ a value function defined on $X$. Given a loss $E$, what is the maximal information rate $R$ for which there is an $[\lfloor k/R \rfloor; k]_q$ linear code $C$ and a P-NN decoder $a_P$ of $C$ such that

\[ E(a_P, \mu) < E? \]

The basic definitions of rate distortion theory (see [7]) can be re-stated in the context of value functions and expected loss.

Definition 2 Let $I = \{x_1, \ldots, x_k\}$ be an information set and $\mu$ a value function on $I$. We say that the rate loss pair $(R, E)$ is realizable if there is an $[\lfloor k/R \rfloor; k]_q$ linear code $C$ and a P-NN decoder $a_P$ such that $E(a_P, \mu) < E$. The rate loss region of $I$ is the closure of the set of all realizable rate loss pairs. The rate loss function $R(E)$ is the maximum $R$ such that $(R, E)$ is in the rate loss region of $I$. The capacity of the channel to transmit information from $I$ given the value function $\mu$ is

\[ C_\mu = \lim R(E). \]

Open Problem 10 Determine $C_\mu$ for a family of value functions, restricted to a family of poset decoders.

10 Appendix 1: Proofs 
10.1 Proof of Theorem 4

Let $C$ be an $[n; k]_q$ linear code and $\mu_{1-0} : C \to \mathbb{R}_+$ be the value function such that $\mu_{1-0}(0) \neq 0$ and $\mu_{1-0}(c) = 0$ for all $0 \neq c \in C$. Let us consider the decoder $a_0 : \mathbb{F}_q^n \to C$ such that $a_0(y) = 0$ for all $y \in \mathbb{F}_q^n$. The total expected loss $E(a_0, \mu_{1-0})$ may be determined without using the general expressions for the expected loss.

When a codeword $c \in C$ is transmitted, a word $y_c$ is received and it is decoded as $a_0(y_c)$. We remark that this decoding results in a loss $\mu_{1-0}(a_0(y_c) - c)$. However, $a_0(y_c) = 0$ for every $y_c$, hence the loss is just $\mu_{1-0}(0 - c) = \mu_{1-0}(-c)$. But $\mu_{1-0}(-c) \neq 0$ iff $c = 0$, and since we are assuming that codewords are sent with probability equal to $\frac{1}{q^k}$ we find that

\[ E(a_0, \mu_{1-0}) = \frac{\mu_{1-0}(0)}{q^k} \]

and this does not depend on the channel.
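Given a decoder $a$ we have that

\[ E(a, \mu_{1-0}) = \sum_{\tau \in C} G_a(\tau)\, \mu_{1-0}(\tau) = G_a(0)\, \mu_{1-0}(0). \]

Considering a discrete channel determined by the set of conditional probabilities $P(y|x)$ we have that

\[ G_a(\tau) = \sum_{y \in \mathbb{F}_q^n} P(y \mid a(y) - \tau)\, P(a(y) - \tau), \]

hence

\[ G_a(0) = \sum_{y \in \mathbb{F}_q^n} P(y \mid a(y))\, P(a(y)). \]

Assuming that the probability distribution $P(c)$ of $C$ is uniform, we find that

\[ G_a(0) = \frac{1}{q^k} \sum_{y \in \mathbb{F}_q^n} P(y \mid a(y)). \]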

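Considering an ML decoder $\tilde{a}$, we have by definition of ML decoder that

\[ P(y \mid \tilde{a}(y)) \geq P(y \mid c) \]

for every $c \in C$, so that

\[ \sum_{y} P(y \mid \tilde{a}(y)) \geq \sum_{y} P(y \mid c) = 1 \tag{3} \]

for every $c \in C$. Since for $c \in C$ and $y \notin \tilde{a}^{-1}(c)$ we have that

\[ P(y \mid \tilde{a}(y)) > P(y \mid c), \]

we find that inequality (3) is actually strict:

\[ \sum_{y} P(y \mid \tilde{a}(y)) > 1. \tag{4} \]

So, since

\[ E(\tilde{a}, \mu_{1-0}) = G_{\tilde{a}}(0)\, \mu_{1-0}(0) = \frac{1}{q^k} \sum_{y} P(y \mid \tilde{a}(y))\, \mu_{1-0}(0), \tag{5} \]

by (4) and (5) we have that

\[ E(\tilde{a}, \mu_{1-0}) > \frac{\mu_{1-0}(0)}{q^k} \]

and since $E(a_0, \mu_{1-0}) = \frac{\mu_{1-0}(0)}{q^k}$ we conclude that

\[ E(\tilde{a}, \mu_{1-0}) > E(a_0, \mu_{1-0}). \]

To finish we just consider the decoder $a = a_0$ defined above, $\mu = \mu_{0-1}$ and $\tilde{\mu} = \mu_{1-0}$.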
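The two inequalities used in this proof can be checked numerically on a toy case. The sketch below assumes a binary symmetric channel with crossover probability `p`, a uniform prior on the codewords, and the length 3 repetition code; these choices and all names are ours, made only for illustration.

```python
from itertools import product

def expected_loss(code, decoder, mu, p):
    """E(a, mu) = sum over c and y of P(c) P(y|c) mu(a(y) - c) on a BSC(p)."""
    n = len(code[0])
    total = 0.0
    for c in code:
        for y in product((0, 1), repeat=n):
            d = sum(a != b for a, b in zip(y, c))
            prob = (p ** d) * ((1 - p) ** (n - d)) / len(code)
            error = tuple(a ^ b for a, b in zip(decoder(y), c))
            total += prob * mu(error)
    return total

code = [(0, 0, 0), (1, 1, 1)]

def ml(y):
    """ML decoder on a BSC with p < 1/2: nearest codeword in Hamming distance."""
    return min(code, key=lambda c: sum(a != b for a, b in zip(y, c)))

def a0(y):
    """The constant decoder a_0(y) = 0 used in the proof."""
    return (0, 0, 0)

def mu_10(tau):
    """mu_{1-0}: value 1 on the zero error, 0 elsewhere."""
    return 1.0 if not any(tau) else 0.0

def mu_01(tau):
    """mu_{0-1}: value 0 on the zero error, 1 elsewhere."""
    return 0.0 if not any(tau) else 1.0

for p in (0.1, 0.3):
    print(p,
          expected_loss(code, a0, mu_10, p), expected_loss(code, ml, mu_10, p),
          expected_loss(code, a0, mu_01, p), expected_loss(code, ml, mu_01, p))
```

For $\mu_{1-0}$ the constant decoder has loss $\mu_{1-0}(0)/q^k$ regardless of `p`, below the loss of the ML decoder, while for $\mu_{0-1}$ the ML decoder has the smaller loss, as in the statement of the theorem.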

10.2 Proof of Theorem 5

Let $d_H$ be the usual Hamming metric and let $c_1, c_2 \in \arg(d_H(C, y))$ with $c_1 \neq c_2$. Define $a_H(y) = c_1$ and let $\tilde{a}_H$ be defined by

\[ \tilde{a}_H(x) = \begin{cases} c_2 & \text{if } x = y, \\ a_H(x) & \text{if } x \neq y. \end{cases} \]

With these definitions of $a_H$ and $\tilde{a}_H$ we find that

\[ d_H(y, a_H(y) - c_1) = n, \]

\[ d_H(y, \tilde{a}_H(y) - c_1) = d_H(y, c_2 - c_1), \]

\[ d_H(y, a_H(y) - c_2) = d_H(y, c_1 - c_2) \]

and

\[ d_H(y, \tilde{a}_H(y) - c_2) = n. \]

Setting $m = d_H(y, c_2 - c_1) = d_H(y, c_1 - c_2)$ we get

\[ T_{(a_H,\tilde{a}_H)}(c_1) = s^n - s^m \]

and

\[ T_{(a_H,\tilde{a}_H)}(c_2) = s^m - s^n. \]

Since $c_1 \neq c_2$, we have that $m < n$ and hence

\[ T_{(a_H,\tilde{a}_H)}(c_1) < 0 < T_{(a_H,\tilde{a}_H)}(c_2) \]

for every $0 < s < 1$.



10.3 Proof of Corollary 1

Let $y = (1, 1, \ldots, 1)$. Since $k > 1$ and $C$ has constant weight we have that $y \notin C$. If $0 \neq c \in C$ is a codeword, since $C$ has constant weight $w$ it follows that

\[ d_H(y, c) = n - w_H(c) = n - w \]

and consequently $\arg(d_H(C, y)) = C - \{0\}$ and $|\arg(d_H(C, y))| = 2^k - 1$. Since we are assuming $k > 1$, we conclude that $|\arg(d_H(C, y))| > 1$ and the result follows from Theorem 5.



10.4 Proof of Theorem 6

In this proof the complement of a subset $X$ of $[n]$ is denoted by $X^c$. A vector $y \in \mathbb{F}_q^n$ can be decomposed as

\[ y = y_{I^c} + y_J + y_{I-J} \]

where $y_{I^c}$, $y_J$ and $y_{I-J}$ are the projections of $y$ onto the coordinate subspaces $C_{I^c}$, $C_J$ and $C_{(I,J)}$ respectively. From this decomposition it follows that

\[ d_H(y, c) = w_H(y_{I^c}) + w_H(y_J) + d_H(y_{I-J}, c) \]

for every $c \in C_{(I,J)}$. It follows that

\[ \arg\min_{\theta \in C_{(I,J)}} d_H(y, \theta) = \arg\min_{\theta \in C_{(I,J)}} \{ w_H(y_{I^c}) + w_H(y_J) + d_H(y_{I-J}, \theta) \} = \arg\min_{\theta \in C_{(I,J)}} d_H(y_{I-J}, \theta) = y_{I-J}, \]

hence $a_H(y) = y_{I-J}$.
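We claim that

\[ a_P(y) = y_{I-J} \]

for every $y \in C_{J^c}$, that is, for every $y$ such that $y_J = 0$. Indeed, this happens because

\[ d_P(y, c) = \begin{cases} |\langle \mathrm{supp}(y_{I^c}) \cup \mathrm{supp}(y_{I-J} - c) \rangle| & \text{if } c \neq y_{I-J}, \\ |\langle \mathrm{supp}(y_{I^c}) \rangle| & \text{if } c = y_{I-J}. \end{cases} \]

It follows that

\[ a_P(y) = \arg\min_{\theta \in C_{(I,J)}} d_P(y, \theta) = y_{I-J}, \]

hence

\[ a_P(y) = a_H(y) \]

for $y \in C_{J^c}$.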

We should now define a P-NN decoder for $y \notin C_{J^c}$. So, we consider $y = y_{I^c} + y_J + y_{I-J}$ with $y_J \neq 0$. If $J' = \mathrm{supp}(y_J)$ and $y_{I^+_{J'}}$ is the projection of $y$ onto $C_{I^+_{J'}}$, then

\[ \theta + y_{I^+_{J'}} \in \arg\min_{\theta' \in C_{(I,J)}} d_P(y, \theta') \]

for every $\theta \in C_{I^-_{J'}}$. At this point we should note that the H-NN decoder $a_H$ already defined is also a P-NN decoder, since $y_{I-J} = y_{I^-_{J'}} + y_{I^+_{J'}}$ for some $y_{I^-_{J'}} \in C_{I^-_{J'}}$. But this will not serve our purpose, since in this case the total expected loss difference is 0. However, there are other possibilities for a P-NN decoder and we will define $a_P$ in such a way that

\[ T_{(a_H,a_P,c)}(s) < 0 < T_{(a_H,a_P,c')}(s) \tag{6} \]

for some pair $c, c' \in C_{(I,J)}$.

Let

\[ y = x_{I^c} + e_J \]

with $x_{I^c} \in C_{I^c}$ and $e_J \in C_J$. Consider

\[ c \in C_{I^-_J} \]

with $c \neq 0$. In this situation $\langle \mathrm{supp}(y - c) \rangle = \langle \mathrm{supp}(y) \rangle$. Thus $a_P(y) := c$ is a P-NN decoder for $y$. We conclude by defining $a_P(y') = y'_{I-J}$ for $y' \neq y$, $y' \notin C_{J^c}$.

We now choose vectors $c_1, c_2 \in C_{(I,J)}$ that will ensure condition (6). We define

\[ c_1 = \sum_{i \in I^+_J} e_i \in C_{(I,J)} \]

and

\[ c_2 = c. \]

Since $a_H(y) = 0$ and $a_P(y) = c$, we have:

\[ n_1 := d_H(y, a_H(y) - c_1) = d_H(y, -c_1) = w_H(y) + |I^+_J| \]

and

\[ m_1 := d_H(y, a_P(y) - c_1) = d_H(y, c - c_1) = w_H(y) + w_H(c) + |I^+_J|, \]

hence $n_1 < m_1$. Moreover,

\[ n_2 := d_H(y, a_H(y) - c_2) = d_H(y, -c_2) = w_H(y) + w_H(c) \]

and

\[ m_2 := d_H(y, a_P(y) - c_2) = d_H(y, c - c_2) = w_H(y), \]

and hence $n_2 > m_2$. By Proposition 1,

\[ G_{(a_H,c_i)}(s) = s^{n_i} + \sum_{y' \neq y} s^{d_H(y', a_H(y') - c_i)} \]

and

\[ G_{(a_P,c_i)}(s) = s^{m_i} + \sum_{y' \neq y} s^{d_H(y', a_P(y') - c_i)}. \]

As $T_{(a_H,a_P,c_i)}(s) = G_{(a_H,c_i)}(s) - G_{(a_P,c_i)}(s)$, $i = 1, 2$, and $a_H(y') = a_P(y')$ for all $y' \neq y$, we obtain

\[ T_{(a_H,a_P,c_1)}(s) = s^{n_1} - s^{m_1} \]

and

\[ T_{(a_H,a_P,c_2)}(s) = s^{n_2} - s^{m_2}. \]

Therefore $T_{(a_H,a_P,c_2)}(s) < 0 < T_{(a_H,a_P,c_1)}(s)$ for all $0 < s < 1$ and the result follows from Theorem 2.

10.5 Proof of Theorem 8

Let $R(s_1), \ldots, R(s_M)$ be the decision regions of the decoder $a : \mathbb{R}^N \to S$. Consider a partition

\[ \{\tilde{R}(s_i), \tilde{R}(s_j)\} \]

of $R(s_i) \cup R(s_j)$, different from the partition $\{R(s_i), R(s_j)\}$, and such that $\tilde{R}(s_i) = R(s_i) \cup S_j$ for some open subset $S_j \subset R(s_j)$ with $S_j \neq \emptyset$. It is obvious that such a partition exists since the decision regions of $a$ have non-empty interior. Under those conditions we have that $\tilde{R}(s_j) = R(s_j) - S_j$. Let

\[ \tilde{a} : \mathbb{R}^N \to S \]

be the decoder of $S$ determined by the decision regions

\[ \big( \{R(s_1), \ldots, R(s_M)\} - \{R(s_i), R(s_j)\} \big) \cup \{\tilde{R}(s_i), \tilde{R}(s_j)\}. \]

Since $\tau = s_j - s_i$ admits a unique solution on $\Delta S$, the same holds for $-\tau = s_i - s_j$ and we find the following:

\[ G_a(s_j - s_i) = \int_{R(s_j)} p(y|s_i)\, P(s_i)\, dy, \]

\[ G_{\tilde{a}}(s_j - s_i) = \int_{R(s_j) - S_j} p(y|s_i)\, P(s_i)\, dy, \]

\[ G_a(s_i - s_j) = \int_{R(s_i)} p(y|s_j)\, P(s_j)\, dy, \]

and

\[ G_{\tilde{a}}(s_i - s_j) = \int_{R(s_i) \cup S_j} p(y|s_j)\, P(s_j)\, dy. \]

It follows that

\[ T_{(a,\tilde{a})}(s_j - s_i) = \int_{S_j} p(y|s_i)\, P(s_i)\, dy > 0 \]

and

\[ T_{(a,\tilde{a})}(s_i - s_j) = -\int_{S_j} p(y|s_j)\, P(s_j)\, dy < 0, \]

as desired.



11 Appendix 2: Details about "Hello World" Encoding Scheme

In the introduction of this work we simulated the transmission of the gray-scale image of the words "Hello World" (Figure 2) through a binary memoryless channel with crossover probability $p = 0.3$ and decoded the received word twice, once using the ML decoder and once using a P-NN decoder (Figure 4). The poset $P$ we used was the total order defined by the relations $1 \preceq 2 \preceq \ldots \preceq 7$. We now describe the encoding process in detail.

The code itself is a $[7; 4]_2$ binary Hamming code, but in the encoding process not only the code as a subset is important, but also the particular color that is attributed to each codeword. The choice of the encoding is intimately related to the nature of the information and the characteristics of the P-NN decoder. We assumed that in the transmitted images the darker tones of gray carry the more important information, namely the tones used in the letters. Since the P-decoder is more susceptible to errors in the last coordinates, we associated the darker tones of gray to codewords that have nonzero entries in the last coordinates (7 and 6), the middle range of grays to the codewords that have nonzero entries in the intermediate coordinates (5, 4 and 3) and the lighter tones of gray to the remaining positions. The actual association is shown in the following table, where the tones of gray are described by the scale in the RGB palette. Since the actual meaning of each piece of information is relevant to decide where to place it as a codeword, we may say we are adopting a message-wise UEP encoding scheme.



RGB                codeword c_i        value μ(c_i)
(101, 101, 101)    c_15 = 1111111      1.00
(102, 102, 102)    c_14 = 0001111      0.90
(103, 103, 103)    c_13 = 0010011      0.89
(104, 104, 104)    c_12 = 1100011      0.88
(105, 105, 105)    c_11 = 1010101      0.87
(106, 106, 106)    c_10 = 0100101      0.86
(107, 107, 107)    c_9  = 0111001      0.85
(108, 108, 108)    c_8  = 1001001      0.84
(109, 109, 109)    c_7  = 0110110      0.83
(110, 110, 110)    c_6  = 1000110      0.82
(187, 187, 187)    c_5  = 1011010      0.50
(188, 188, 188)    c_4  = 0101010      0.40
(189, 189, 189)    c_3  = 0011100      0.30
(190, 190, 190)    c_2  = 1101100      0.20
(191, 191, 191)    c_1  = 1110000      0.10
(192, 192, 192)    c_0  = 0000000      0.00



Considering this encoding and the values listed in the previous table, we can list all the 32 polynomials $G_{a_P}(c_i)$ and $G_{a_H}(c_i)$, $i = 0, 1, \ldots, 15$, associated to the expected loss functions $E(a_P, \mu)$ and $E(a_H, \mu)$ respectively. We remark (without proving) that for the Hamming case the polynomial $G_{a_H}(\tau)$ depends only on the weight $w_H(\tau)$, hence it is sufficient to know the polynomials $G_{a_H}(c_0)$, $G_{a_H}(c_{13})$, $G_{a_H}(c_{14})$ and $G_{a_H}(c_{15})$:

\[
\begin{aligned}
G_{a_H}(c_0) &= 16 + 112s \\
G_{a_H}(c_{13}) &= 48s^2 + 16s^3 + 64s^4 \\
G_{a_H}(c_{14}) &= 64s^3 + 16s^4 + 48s^5 \\
G_{a_H}(c_{15}) &= 112s^6 + 16s^7
\end{aligned}
\]

As for $E(a_P, \mu)$ we have that:

\[
\begin{aligned}
G_{a_P}(c_0) &= 16 + 39s + 39s^2 + 25s^3 + 9s^4 \\
G_{a_P}(c_1) &= 25s + 57s^2 + 39s^3 + 7s^4 \\
G_{a_P}(c_2) &= 10s + 42s^2 + 54s^3 + 22s^4 \\
G_{a_P}(c_3) &= 6s + 22s^2 + 42s^3 + 42s^4 + 16s^5 \\
G_{a_P}(c_4) &= 7s + 39s^2 + 57s^3 + 25s^4 \\
G_{a_P}(c_5) &= 9s + 25s^2 + 39s^3 + 39s^4 + 16s^5 \\
G_{a_P}(c_6) &= 16s^2 + 42s^3 + 42s^4 + 22s^5 + 6s^6 \\
G_{a_P}(c_7) &= 22s^3 + 54s^4 + 42s^5 + 10s^6 \\
G_{a_P}(c_8) &= 10s + 42s^2 + 54s^3 + 22s^4 \\
G_{a_P}(c_9) &= 6s + 22s^2 + 42s^3 + 42s^4 + 16s^5 \\
G_{a_P}(c_{10}) &= 16s^2 + 39s^3 + 39s^4 + 25s^5 + 9s^6 \\
G_{a_P}(c_{11}) &= 25s^3 + 57s^4 + 39s^5 + 7s^6 \\
G_{a_P}(c_{12}) &= 16s^2 + 42s^3 + 42s^4 + 22s^5 + 6s^6 \\
G_{a_P}(c_{13}) &= 22s^3 + 54s^4 + 42s^5 + 10s^6 \\
G_{a_P}(c_{14}) &= 7s^3 + 39s^4 + 57s^5 + 25s^6 \\
G_{a_P}(c_{15}) &= 9s^3 + 25s^4 + 39s^5 + 39s^6 + 16s^7
\end{aligned}
\]
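The Hamming-case polynomials can be recomputed by brute force from the sixteen codewords of the table. The sketch below assumes the unnormalized form $G_a(\tau)(s) = \sum_y s^{d_H(y, a(y) - \tau)}$; the function names are ours. Because the Hamming code is perfect, the nearest codeword is unique and the H-NN decoder is well defined. The P-NN polynomials are not recomputed here, since they depend on how the decoder breaks ties, which this appendix does not specify.

```python
# Codewords c_0, ..., c_15 exactly as listed in the table of this appendix.
CODEWORDS = [
    "0000000", "1110000", "1101100", "0011100",
    "0101010", "1011010", "1000110", "0110110",
    "1001001", "0111001", "0100101", "1010101",
    "1100011", "0010011", "0001111", "1111111",
]
CODE = [int(word, 2) for word in CODEWORDS]

def weight(x):
    """Hamming weight of a 7-bit word stored as an integer."""
    return bin(x).count("1")

def a_H(y):
    """H-NN (here also ML) decoder: the unique nearest codeword."""
    return min(CODE, key=lambda c: weight(y ^ c))

def g_hamming(tau):
    """Coefficients of G_{a_H}(tau)(s): entry k counts the received words y
    with d_H(y, a_H(y) - tau) = k (binary, so subtraction is XOR)."""
    coeffs = [0] * 8
    for y in range(2 ** 7):
        coeffs[weight(y ^ a_H(y) ^ tau)] += 1
    return coeffs

for i in (0, 13, 14, 15):
    print(f"G_aH(c_{i}) coefficients:", g_hamming(CODE[i]))
```

Each coefficient list sums to $2^7 = 128$, one term per received word, which is a convenient sanity check on any of the polynomials listed above.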



In the following figure we can see the graphs of the differences $T_{(a_P,a_H,c)}(s)$.

Figure 6: Difference between overall expected functions.

Computing the difference between the overall expected loss functions relative to the value function $\mu$, as a function of $s$, we get:

\[ E_{(a_P,a_H)}(\mu) = 27.10s - 58.34s^2 - 58.79s^3 + 58.97s^4 + 40.33s^5 - 9.27s^6. \]
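This polynomial can be reassembled from the table of values and the listed polynomials as $\sum_i \mu(c_i)\,(G_{a_P}(c_i) - G_{a_H}(c_i))$. The sketch below does so; the data are transcribed from this appendix, while the storage format (lowest exponent plus consecutive coefficients), the bisection routine and all names are our own choices.

```python
# Values mu(c_0), ..., mu(c_15) and Hamming weights of c_0, ..., c_15.
MU = [0.00, 0.10, 0.20, 0.30, 0.40, 0.50, 0.82, 0.83,
      0.84, 0.85, 0.86, 0.87, 0.88, 0.89, 0.90, 1.00]
WEIGHT = [0, 3, 4, 3, 3, 4, 3, 4, 3, 4, 3, 4, 4, 3, 4, 7]

# G_{a_H} depends only on the weight of the codeword: {exponent: coefficient}.
G_H = {0: {0: 16, 1: 112}, 3: {2: 48, 3: 16, 4: 64},
       4: {3: 64, 4: 16, 5: 48}, 7: {6: 112, 7: 16}}

# G_{a_P}(c_i) as (lowest exponent, consecutive coefficients).
G_P = [(0, [16, 39, 39, 25, 9]), (1, [25, 57, 39, 7]),
       (1, [10, 42, 54, 22]), (1, [6, 22, 42, 42, 16]),
       (1, [7, 39, 57, 25]), (1, [9, 25, 39, 39, 16]),
       (2, [16, 42, 42, 22, 6]), (3, [22, 54, 42, 10]),
       (1, [10, 42, 54, 22]), (1, [6, 22, 42, 42, 16]),
       (2, [16, 39, 39, 25, 9]), (3, [25, 57, 39, 7]),
       (2, [16, 42, 42, 22, 6]), (3, [22, 54, 42, 10]),
       (3, [7, 39, 57, 25]), (3, [9, 25, 39, 39, 16])]

def loss_difference():
    """Coefficients (s^0..s^7) of sum_i mu(c_i) (G_aP(c_i) - G_aH(c_i))."""
    diff = [0.0] * 8
    for i, (low, coeffs) in enumerate(G_P):
        for k, c in enumerate(coeffs):
            diff[low + k] += MU[i] * c
        for exponent, c in G_H[WEIGHT[i]].items():
            diff[exponent] -= MU[i] * c
    return diff

def evaluate(coeffs, s):
    return sum(c * s ** k for k, c in enumerate(coeffs))

def crossover(coeffs, lo=0.1, hi=0.9, steps=60):
    """Bisection for the sign change of the difference on [lo, hi]."""
    for _ in range(steps):
        mid = (lo + hi) / 2
        if evaluate(coeffs, mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

diff = loss_difference()
root = crossover(diff)
print([round(c, 2) for c in diff], root, root / (1 + root))
```

The sign change located by the bisection lies slightly below $s = 0.4$; the last printed number converts it to a crossover probability through $p = s/(1+s)$.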

The graph below shows that for $s > 0.4$ (equivalently, for $p > 0.29$) the decoder $a_P$ performs better than $a_H$.

Figure 7: Graph of $E_{(a_P,a_H)}(\mu)$ as a function of $s$.

Now we come back to the "Hello World" picture. Since the darker tones of gray are represented by codewords that have a nonzero entry in at least one of the last two coordinates, and since the probability of an error occurring in one of the first five entries is higher than in the last two, when a dark gray message is transmitted there is a higher probability that the P-NN decoder decodes the received message as a dark tone of gray that may not be the correct one, but looks like the original message (see Figure 4). In other words, the last two coordinates are more protected than the others for decoding in gray scale. In this sense we are making a bit-wise UEP decoding.




Figure 8: Each image contains 6400 pixels. The original message was the dark 
gray (101, 101, 101) in RGB; on the left we used ML decoding and on the right 
the P-NN decoding. 

Of course a repetition code could attain similar results, but in order to get a similar quality of the "Hello World" picture under severe transmission conditions (crossover probability $0.3 < p < 0.45$), the rate of information would be much smaller than the rate achieved in this case. In Figure 9 we can see that even under a very high error probability ($p = 0.4$ and $p = 0.43$) it is possible to grasp something of the original message.

Figure 9: Image "Hello World" decoded after being transmitted through a BSMC with crossover probability $p = 0.4$ and $p = 0.43$ respectively.